The next generation of reentry vehicles is envisioned to have an onboard autonomous capability for real-time trajectory planning, enabling responsive launch and the delivery of payloads anywhere with precise flight termination. This capability is also desired to overcome, where possible, in-flight vehicle damage or control effector failure that degrades vehicle performance. An aerial vehicle is modeled as a nonlinear multi-input-multi-output (MIMO) system. An ideal optimal trajectory control system generates a series of control commands that achieves a desired trajectory under various disturbances and vehicle model uncertainties, including aerodynamic perturbations caused by geometric damage to the vehicle. Conventional approaches suffer from the nonlinearity of the MIMO system and the high dimensionality of the system state space. In this paper, we apply a Neural Dynamic Optimization (NDO) based approach to overcome these difficulties. The core of an NDO model is a multilayer perceptron (MLP) neural network that generates the control parameters online. The advantage of the NDO system is that, once trained, it is very fast and produces the trajectory almost instantaneously; the bulk of the time-consuming computation is required only during offline training. The inputs of the MLP are the time-varying states of the MIMO system, and its outputs are the near-optimal control parameters.


I. Introduction

The next generation of reentry vehicles is envisioned to have an onboard autonomous capability for real-time trajectory planning, enabling responsive launch and the delivery of payloads anywhere with precise flight termination. This capability is also desired to overcome, where possible, in-flight vehicle damage or control effector failure that degrades vehicle performance. Real-time trajectory determination and trajectory guidance for highly nonlinear reentry vehicles, such as reusable launch vehicles (RLVs), have been of considerable research interest in the recent past [5]. The conventional approach to mission operations for highly nonlinear systems consists of offline reference trajectory design and onboard tracking of that reference trajectory. However, if the vehicle's dynamic behavior is altered significantly by damage to or failure of the vehicle or one of its subsystems, a preplanned trajectory may cease to be feasible, possibly resulting in a catastrophic failure and loss of the vehicle. The traditional trajectory design approach must therefore be augmented with a real-time trajectory redesign capability, particularly for fully autonomous vehicles. With this motivation, we investigated the performance of a neural-network-based approach to real-time trajectory design.

A feasible descent trajectory must lie within an entry corridor defined by path constraints that reflect acceptable thermal, structural, and operational limits. The trajectory design system determines the commands required to maneuver the vehicle along a feasible trajectory that accomplishes a desired objective. More specifically, given an initial state of the vehicle, the trajectory design system generates a history of control inputs that brings the vehicle to the desired final boundary conditions while satisfying all the trajectory constraints, so that the designed trajectory is feasible.

The goal of this paper is to generate an approximately optimal trajectory using a Neural Dynamic Optimization (NDO) based approach [1]-[3], which provides an approximate technique for solving the dynamic programming (DP) problem for multi-input-multi-output (MIMO) discrete systems. Dynamic programming finds an optimal feedback solution for the trajectory of a nonlinear MIMO system. However, the DP solution can be very difficult to obtain for higher-order systems and is impractical for real-time applications. The NDO approach applies a more practical method to handle the nonlinearity of the MIMO system: it leverages neural networks within the framework of optimal control theory to obtain a feedback solution for MIMO control systems.

NDO shows distinct advantages over other optimal control approaches such as Feedforward Optimal Control (FOC) and Dynamic Programming (DP). FOC can find an optimal control solution relatively easily, and its computational load is tractable. However, FOC finds only a single trajectory for a given initial state; thus, its solution is very sensitive to disturbances and cannot handle uncertainties. DP can provide an optimal solution that handles disturbances and uncertainties. However, the real-time computation and storage requirements associated with DP solutions can be challenging, especially for high-order nonlinear systems. NDO overcomes both problems by using a neural network as the controller; the network is more robust to uncertainties and provides real-time numerical computation capability. The design of NDO is similar to that of FOC, meaning that it is computationally tractable. The difference is that an FOC approach generates the trajectory directly from the initial state, whereas NDO trains a neural network whose responsibility is to generate the trajectory dynamically. Interestingly, in certain special cases, NDO is equivalent to FOC and to DP, respectively.

The core of an NDO model is a multilayer perceptron (MLP) neural network that generates the control inputs for a nonlinear discrete system. The inputs of the MLP are the time-varying states of the MIMO system at discrete times. The outputs of the MLP, the control parameters, are used by the MIMO system model to generate new system states. With this formulation, an NDO model can approximate the time-varying optimal feedback solution.

In our approach, we model the reentry vehicle as a nonlinear MIMO hybrid system [4]. A hybrid system is loosely defined as a system that involves the interaction of discrete-event and continuous-time dynamics. Hence, to determine the values of the system states $x[k+1]$ at the next step, the continuous-time differential equations of the plant are integrated over the given sampling time, starting from the previous state $x[k]$; the control inputs $u[k]$ are held constant during this integration. The inputs of the hybrid system model are therefore the current system states and the control parameters, and its outputs are the new system states. Given a task, an ideal trajectory control system will generate a series of control commands that achieves a desired trajectory under various disturbances and vehicle model uncertainties, including aerodynamic uncertainties resulting from geometric damage to the vehicle. Conventional control generation approaches suffer from the nonlinearity of the MIMO system and the high dimensionality of the system state space.
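The zero-order-hold step described above can be sketched as follows. This is a minimal sketch, assuming a fixed-step fourth-order Runge-Kutta integrator with sub-steps; the paper does not state which integrator was used, and the names `rk4_step`, `hybrid_step`, and `lag` are hypothetical.

```python
import numpy as np


def rk4_step(f_cont, x, u, dt, substeps=10):
    """Integrate dx/dt = f_cont(x, u) over one interval dt with u held fixed."""
    h = dt / substeps
    x = np.asarray(x, dtype=float)
    for _ in range(substeps):
        k1 = f_cont(x, u)
        k2 = f_cont(x + 0.5 * h * k1, u)
        k3 = f_cont(x + 0.5 * h * k2, u)
        k4 = f_cont(x + h * k3, u)
        x = x + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return x


def hybrid_step(f_cont, x_k, u_k, dt):
    """Discrete map x[k+1] = f(x[k], u[k]): zero-order hold on u[k] over dt."""
    return rk4_step(f_cont, x_k, u_k, dt)


def lag(x, u):
    """Placeholder plant: first-order lag dx/dt = -x + u."""
    return -x + u


# Usage: hold u = 1 over one sampling interval dt = 0.5, starting from x = 0.
x1 = hybrid_step(lag, np.array([0.0]), 1.0, 0.5)
```

For this placeholder plant the exact answer is $1 - e^{-0.5}$, so the sketch can be checked directly; for the reentry vehicle, `f_cont` would be the right-hand side of the vehicle dynamics and `dt` the sampling interval.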

This paper is organized as follows. Section II introduces the NDO approach and discusses implementation issues. Section III describes the reentry vehicle dynamics and the path constraints. Section IV presents the experimental results. Section V concludes the paper and discusses future research directions.

II. Neural Dynamic Optimization

In this section we present the basics of Neural Dynamic Optimization.

A. NDO Formulation

For an optimal control problem, we seek a trajectory of a dynamical system that minimizes a penalty functional $J$, which maps a Lagrangian function defined over the entire path to a real number. Let a given discrete nonlinear time-invariant (NLTI) multi-input-multi-output (MIMO) system be defined as:

$$x[k+1] = f\big(x[k], u[k]\big), \qquad (1)$$

where $x[k] \in \mathbb{R}^n$ is the state vector and $u[k] \in \mathbb{R}^m$ is the control vector. Let the penalty function be defined as:


$$J = \phi\big(x[N]\big) + \sum_{k=0}^{N-1} \Gamma\big(x[k], u[k]\big), \qquad (2)$$
where the function $\phi(\cdot)$ penalizes the final state $x[N]$, and the Lagrangian function $\Gamma(\cdot)$ penalizes the state $x[k]$ and the control inputs $u[k]$ for $k = 0, \ldots, N-1$.

The optimal trajectory design problem can now be stated as follows:

Given an initial state $x_0$, find an optimal set of controls $\{u[k]\}_{k=0}^{N-1}$ over $N$ steps that minimizes the penalty functional $J$, which is based on the system state history, the final system state, and the history of control effort.
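As an illustration of Eq. (2), the penalty of a given state and control history can be evaluated as below. This is a minimal sketch; the function name `penalty`, the array layout (states stacked row-wise as $x[0], \ldots, x[N]$), and the quadratic penalties in the usage lines are our own assumptions, not the vehicle penalties used later.

```python
import numpy as np


def penalty(states, controls, phi, gamma):
    """Evaluate J = phi(x[N]) + sum_{k=0}^{N-1} Gamma(x[k], u[k]), Eq. (2).

    states:   array of shape (N + 1, n) holding x[0], ..., x[N]
    controls: array of shape (N, m) holding u[0], ..., u[N-1]
    """
    states = np.asarray(states, dtype=float)
    controls = np.asarray(controls, dtype=float)
    if len(states) != len(controls) + 1:
        raise ValueError("expected N + 1 states for N controls")
    running = sum(gamma(x, u) for x, u in zip(states[:-1], controls))
    return phi(states[-1]) + running


# Usage with placeholder quadratic penalties.
J = penalty(
    states=[[1.0], [0.5], [0.25]],
    controls=[[-0.5], [-0.25]],
    phi=lambda x: 10.0 * float(x @ x),
    gamma=lambda x, u: float(x @ x + u @ u),
)
```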

Figure 1 illustrates the schematic of a basic NDO-based optimizer, which consists of two components: a representative system model of the given dynamical system, and a neural network model representing the feedback controller. In many applications, the exact system function may be highly nonlinear and complex, and hard to obtain in analytic form. Hence, we use a representative system model $\hat{f}(\cdot)$ that approximates the true system dynamics function $f(\cdot)$. The system model $\hat{f}(\cdot)$ projects the system to its new state based on the current state and the control inputs. The controller $g(\cdot)$ generates a control vector based on the current system state. The controller and the system simulator are coupled so that together they generate a series of states and controls that guides the system optimally to the desired final state $x[N]$.


Figure 1. Diagram of an NDO-based optimizer.
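The coupling of controller and system model in Fig. 1 amounts to a forward rollout. The sketch below assumes callables `f(x, u)` and `g(x, W)` standing in for the system model and the network; the function name `rollout` is hypothetical, and the linear model and controller in the usage lines are placeholders for the vehicle model and the MLP.

```python
import numpy as np


def rollout(f, g, x0, W, N):
    """Generate x[0..N] and u[0..N-1] by coupling u[k] = g(x[k], W)
    with the system model x[k+1] = f(x[k], u[k])."""
    xs = [np.asarray(x0, dtype=float)]
    us = []
    for _ in range(N):
        u = g(xs[-1], W)          # controller acts on the current state
        us.append(u)
        xs.append(f(xs[-1], u))   # model projects to the next state
    return np.array(xs), np.array(us)


# Usage with a placeholder model and linear controller.
xs, us = rollout(
    f=lambda x, u: 0.9 * x + u,
    g=lambda x, W: W @ x,
    x0=[1.0],
    W=np.array([[-0.4]]),
    N=3,
)
```

With these placeholders the closed loop is $x[k+1] = 0.5\,x[k]$, so the state halves at every step.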


The challenge in developing an NDO-based controller is to design a high-performance MIMO controller that handles the complexities arising from the nonlinearity of the plant. Optimal trajectories for a nonlinear system are expected to require a controller that is nonlinear with respect to the system states. We take advantage of neural networks, structures of massively connected computational units, to handle this complexity. With nonlinear activation functions, the network can generalize and model the optimal controller as a function of the system states. The learning capability of a neural network $g(x[k], W)$ is governed by its structure and by the values of its weight matrix, denoted $W$. If a fully connected three-layer multilayer perceptron (MLP), with one input layer, one hidden layer, and one output layer, is selected, the controller design reduces to finding an effective means of fine-tuning $W$.
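A fully connected three-layer MLP controller $g(x, W)$ can be sketched as follows. The tanh hidden activation, the linear output layer, the small random initialization, and the dictionary layout of $W$ are assumptions made for illustration; the text above only specifies a nonlinear activation and a three-layer structure. The names `init_weights` and `mlp_controller` are hypothetical.

```python
import numpy as np


def init_weights(n_in, n_hidden, n_out, rng, scale=0.1):
    """Hypothetical container for W: input-to-hidden and hidden-to-output layers."""
    return {
        "W1": scale * rng.standard_normal((n_hidden, n_in)),
        "b1": np.zeros(n_hidden),
        "W2": scale * rng.standard_normal((n_out, n_hidden)),
        "b2": np.zeros(n_out),
    }


def mlp_controller(x, W):
    """g(x, W): one tanh hidden layer followed by a linear output layer."""
    hidden = np.tanh(W["W1"] @ x + W["b1"])
    return W["W2"] @ hidden + W["b2"]


rng = np.random.default_rng(0)
W = init_weights(n_in=3, n_hidden=15, n_out=2, rng=rng)
u = mlp_controller(np.array([1.01, 0.9, 0.0]), W)   # placeholder state
```

For the reentry problem of Section III, the sizes would be three inputs ($r$, $V$, $\gamma$), the 15 hidden nodes reported in Section IV, and two outputs ($\alpha$, $\sigma$); mapping the outputs into the admissible control ranges is a separate design choice not shown here.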

In the remainder of this section, we discuss how to train the neural network, based on optimal control theory, so that its output fulfills the control goal.

B. Training an NDO model

Assume that the initial system state $x_0$ lies in a state space described by a probability distribution $P(x_0)$. The NDO-based trajectory design problem can then be stated as:

Minimize the cost function $J$ defined in Eq. (2) with respect to the weights $W$, subject to the constraints

$$x[k+1] = f\big(x[k], u[k]\big), \quad k = 0, \ldots, N-1, \qquad (3)$$

$$x[0] = x_0 \sim P(x_0), \qquad (4)$$

$$u[k] = g\big(x[k], W\big), \quad k = 0, \ldots, N-1. \qquad (5)$$

Note that both the system equation, Eq. (3), and the control equation, Eq. (5), are treated as constraints. We solve this constrained optimization problem using the Euler-Lagrange equations. First, we construct the augmented cost function by adjoining the constraints with the Lagrange multipliers $\lambda_x[k]$ and $\lambda_u[k]$. The Lagrange multipliers are also known as the costates of the adjoint system. The augmented penalty function is given as:


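$$\bar{J} = \phi\big(x[N]\big) + \sum_{k=0}^{N-1} \Gamma\big(x[k], u[k]\big) + \lambda_x^T[0]\big(x_0 - x[0]\big) + \sum_{k=0}^{N-1} \lambda_x^T[k+1]\Big(f\big(x[k], u[k]\big) - x[k+1]\Big) + \sum_{k=0}^{N-1} \lambda_u^T[k]\Big(g\big(x[k], W\big) - u[k]\Big) \qquad (6)$$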

The solution to the constrained optimization problem is determined by a saddle point of the augmented penalty function $\bar{J}(x[k], u[k], W, \lambda_x[k], \lambda_u[k])$. At the optimal solution, the first variation of $\bar{J}$ with respect to $W$, $x[k]$, $u[k]$, $\lambda_x[k]$, and $\lambda_u[k]$ must be zero. Differentiating $\bar{J}$ with respect to each of these quantities and setting the results equal to zero yields the conditions for optimality. The first variations with respect to the Lagrange multipliers $\lambda_x[k]$ and $\lambda_u[k]$ return the constraint equations, Eqs. (3) and (5), respectively, whereas the first variations with respect to the other quantities yield the following conditions:

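Condition 1: The first variation with respect to the state variables $x[k]$ is given as

$$\frac{\partial \bar{J}\big(x[k], u[k], \lambda_x[k], \lambda_u[k], W\big)}{\partial x[k]} = 0, \quad k = 0, \ldots, N-1, \qquad (7)$$

which yields the customary costate equations of an adjoint discrete system, defined as

$$\lambda_x^T[k] = \frac{\partial \Gamma\big(x[k], u[k]\big)}{\partial x[k]} + \lambda_x^T[k+1]\,\frac{\partial f\big(x[k], u[k]\big)}{\partial x[k]} + \lambda_u^T[k]\,\frac{\partial g\big(x[k], W\big)}{\partial x[k]}, \quad k = 0, \ldots, N-1. \qquad (8)$$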


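In the above equation, we note the coupling of the costates $\lambda_x$, which correspond to the system states $x$, with the costates $\lambda_u$, which correspond to the control parameters $u$. The final boundary condition for the above adjoint system is determined from the first variation with respect to $x[N]$:

$$\frac{\partial \bar{J}}{\partial x[N]} = 0, \qquad (9)$$

which yields

$$\lambda_x^T[N] = \frac{\partial \phi\big(x[N]\big)}{\partial x[N]}. \qquad (10)$$

Condition 2: The first variation with respect to the control parameters $u[k]$ is

$$\frac{\partial \bar{J}\big(x[k], u[k], \lambda_x[k], \lambda_u[k], W\big)}{\partial u[k]} = 0, \quad k = 0, \ldots, N-1, \qquad (11)$$

which yields the other adjoint system, with costates $\lambda_u$:

$$\lambda_u^T[k] = \frac{\partial \Gamma\big(x[k], u[k]\big)}{\partial u[k]} + \lambda_x^T[k+1]\,\frac{\partial f\big(x[k], u[k]\big)}{\partial u[k]}, \quad k = 0, \ldots, N-1. \qquad (12)$$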
This adjoint system is similar to the conventional adjoint system in optimal control problems; it is an artifact of the constraints placed on the control parameters by the controller function.

Condition 3: The first variation with respect to the neural network weights $W$ gives the optimality condition

$$\frac{\partial \bar{J}\big(x[k], u[k], W, \lambda_x[k], \lambda_u[k]\big)}{\partial W} = 0, \qquad (13)$$

which yields

$$\sum_{k=0}^{N-1} \lambda_u^T[k]\,\frac{\partial g\big(x[k], W\big)}{\partial W} = 0. \qquad (14)$$

These optimality conditions determine the neural network weights $W$ as a function of the history of the costates $\lambda_u[k]$.
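Read together, Eqs. (8), (10), (12), and (14) describe a backward sweep whose left-hand side of Eq. (14) is the gradient of $J$ with respect to $W$. The sketch below implements that sweep with user-supplied Jacobians and checks it against central finite differences on a small linear-quadratic toy problem. The toy system, the linear controller $u = Wx$ used in place of an MLP, and the names `ndo_gradient` and `lq_check` are assumptions for illustration only.

```python
import numpy as np


def ndo_gradient(xs, us, W, dphi_dx, dG_dx, dG_du, df_dx, df_du, dg_dx, dg_dW):
    """Backward costate sweep returning sum_k lambda_u[k]^T dg/dW.

    xs: (N+1, n) states and us: (N, m) controls from a forward rollout.
    Gradient callables return row vectors; Jacobian callables return matrices;
    dg_dW(x, W) must return an array of shape (m, *W.shape).
    """
    N = len(us)
    lam_x = dphi_dx(xs[N])                                   # Eq. (10)
    grad = np.zeros_like(W, dtype=float)
    for k in range(N - 1, -1, -1):
        x, u = xs[k], us[k]
        lam_u = dG_du(x, u) + lam_x @ df_du(x, u)            # Eq. (12), uses lambda_x[k+1]
        grad += np.tensordot(lam_u, dg_dW(x, W), axes=1)     # summand of Eq. (14)
        lam_x = dG_dx(x, u) + lam_x @ df_dx(x, u) + lam_u @ dg_dx(x, W)  # Eq. (8)
    return grad


def lq_check(seed=0, N=5, eps=1e-6):
    """Compare ndo_gradient with central differences on a toy LQ problem."""
    rng = np.random.default_rng(seed)
    A = np.array([[1.0, 0.1], [0.0, 0.95]])
    B = np.array([[0.0], [0.1]])
    Q, R, Qf = np.eye(2), 0.1 * np.eye(1), 5.0 * np.eye(2)
    W = 0.1 * rng.standard_normal((1, 2))      # linear controller u = W x
    x0 = np.array([1.0, -0.5])

    def rollout(W):
        xs, us = [x0], []
        for _ in range(N):
            us.append(W @ xs[-1])
            xs.append(A @ xs[-1] + B @ us[-1])
        return np.array(xs), np.array(us)

    def cost(W):
        xs, us = rollout(W)
        running = sum(x @ Q @ x + u @ R @ u for x, u in zip(xs[:-1], us))
        return xs[-1] @ Qf @ xs[-1] + running

    xs, us = rollout(W)
    analytic = ndo_gradient(
        xs, us, W,
        dphi_dx=lambda x: 2.0 * Qf @ x,
        dG_dx=lambda x, u: 2.0 * Q @ x,
        dG_du=lambda x, u: 2.0 * R @ u,
        df_dx=lambda x, u: A,
        df_du=lambda x, u: B,
        dg_dx=lambda x, W: W,
        dg_dW=lambda x, W: np.einsum("ia,b->iab", np.eye(1), x),
    )
    numeric = np.zeros_like(W)
    for idx in np.ndindex(W.shape):
        dW = np.zeros_like(W)
        dW[idx] = eps
        numeric[idx] = (cost(W + dW) - cost(W - dW)) / (2.0 * eps)
    return analytic, numeric


analytic, numeric = lq_check()
```

The agreement between `analytic` and `numeric` illustrates why the costate sweep is an efficient way to obtain the weight gradient: one backward pass replaces one cost evaluation per weight.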

We use a stochastic steepest descent algorithm to find an optimal weight matrix $W$. Fig. 2 summarizes this process.


Initialize the time horizon $N$, a learning rate $\mu$, and an initial guess of $W$;

Define the penalty functions $\phi(\cdot)$ and $\Gamma(\cdot)$;

1. Pick $x[0]$ based on Eq. (4);

2. For $k = 0, \ldots, N-1$, compute $u[k]$ and $x[k+1]$ based on Eqs. (5) and (3);

3. Compute $\lambda_x[N]$ based on Eq. (10);

4. For $k = N-1, \ldots, 0$, compute $\lambda_u[k]$ and $\lambda_x[k]$ based on Eqs. (12) and (8), respectively;

5. Update $W$: $W \leftarrow W - \mu \sum_{k=0}^{N-1} \lambda_u^T[k]\,\partial g(x[k], W)/\partial W$;

6. Go back to step 1, or output $W$ if the stopping criterion is reached.

Fig. 2. Pseudo Code of Stochastic Steepest Descent Algorithm for NDO
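The loop in Fig. 2 can be sketched on a deliberately simple case: a scalar system $x[k+1] = a\,x[k] + b\,u[k]$ with $\Gamma = q x^2 + r u^2$, $\phi = q_f x^2$, and a one-weight linear controller $u = w x$ standing in for the MLP. All of these choices, the parameter values, the uniform distribution for $x[0]$, and the fixed iteration budget used as the stopping criterion are assumptions for illustration; only the structure of steps 1 to 5 mirrors Fig. 2. The function name is hypothetical.

```python
import numpy as np


def train_ndo_scalar(a=1.0, b=1.0, q=1.0, r=1.0, qf=1.0, N=20,
                     mu=1e-3, iters=3000, seed=0):
    """Stochastic steepest descent of Fig. 2 on a scalar toy problem.

    Toy assumptions: x[k+1] = a x[k] + b u[k], Gamma = q x^2 + r u^2,
    phi = qf x[N]^2, controller u = w x (single weight instead of an MLP).
    """
    rng = np.random.default_rng(seed)
    w = 0.0                                        # initial guess of W
    for _ in range(iters):
        x0 = rng.uniform(-1.0, 1.0)                # step 1: draw x[0] ~ P(x0)
        xs, us = [x0], []
        for _k in range(N):                        # step 2: forward pass
            us.append(w * xs[-1])
            xs.append(a * xs[-1] + b * us[-1])
        lam_x = 2.0 * qf * xs[N]                   # step 3: Eq. (10)
        grad = 0.0
        for k in range(N - 1, -1, -1):             # step 4: backward pass
            lam_u = 2.0 * r * us[k] + lam_x * b    # Eq. (12), uses lambda_x[k+1]
            grad += lam_u * xs[k]                  # summand of Eq. (14); dg/dw = x[k]
            lam_x = 2.0 * q * xs[k] + lam_x * a + lam_u * w   # Eq. (8)
        w -= mu * grad                             # step 5: steepest-descent update
    return w                                       # step 6: fixed iteration budget


w = train_ndo_scalar()
```

With the default parameters, the learned weight should settle close to $-(\sqrt{5}-1)/2 \approx -0.618$, the stationary linear-quadratic feedback gain for this toy problem. The small learning rate matters here: a much larger step at the start, where the gradient is large, can push the closed loop unstable.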


C. NDO model for infinite horizon problems

In the previous sections, we discussed NDO models with a finite time horizon $N$. An NDO designed following that scheme does not support solving infinite-horizon problems directly. However, such problems can be solved with the same NDO approach by assigning $N$ a sufficiently large value: if $N$ is sufficiently large, the solution to the finite-horizon problem converges to the solution to the corresponding infinite-horizon problem. Ref. [2] provides a detailed analysis of how to set the initial neural network structure, and Ref. [3] illustrates NDO on an infinite-horizon problem.
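The convergence claim can be seen in the simplest setting where the finite-horizon optimum is available in closed form: a scalar linear-quadratic problem solved by the backward Riccati recursion. This is an illustrative example we chose, not the NDO procedure itself; the parameters and the function name are assumptions.

```python
def finite_horizon_first_gain(N, a=1.0, b=1.0, q=1.0, r=1.0, qf=1.0):
    """Return K[0] (u[0] = -K[0] x[0]) for the scalar problem
    min qf x[N]^2 + sum_{k<N} (q x[k]^2 + r u[k]^2), x[k+1] = a x[k] + b u[k],
    via the backward Riccati recursion."""
    P = qf                                  # P[N] = qf
    K = 0.0
    for _ in range(N):                      # sweep k = N-1, ..., 0
        K = a * b * P / (r + b * b * P)     # gain at step k uses P[k+1]
        P = q + a * a * P - a * b * P * K   # Riccati update to P[k]
    return K


gains = [finite_horizon_first_gain(N) for N in (1, 2, 5, 20, 50)]
```

With $a = b = q = r = q_f = 1$, the first-step gain is $0.5$ for $N = 1$ and approaches the infinite-horizon value $(\sqrt{5}-1)/2 \approx 0.618$ within a few dozen steps, which is the behavior the large-$N$ argument relies on.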


III. System Model

This section defines the dynamics of the reentry vehicle.

A. System Functions

The vehicle model used in this paper is a vertical-takeoff, horizontal-landing, winged-body, unmanned craft studied at the NASA Marshall Space Flight Center [5]. We are interested in determining a reentry trajectory that brings the vehicle from the entry point in the upper atmosphere to a comparatively low altitude at a relatively low velocity. More specifically, under normal conditions the vehicle enters the atmosphere at an altitude of around 120 km with a speed of approximately 7450 m/s. The goal is to determine a feasible trajectory that guides the vehicle to an altitude of around 25 km with a speed of approximately 750 m/s while satisfying the trajectory feasibility constraints along the way. Table 1 summarizes the symbols used in this paper for the vehicle model and the guidance problem.
Table 1. List of symbols and notations

| Symbol | Description |
|---|---|
| $r$ | radial distance from the center of the Earth to the vehicle, normalized by $R_e$ |
| $V$ | Earth-relative velocity, normalized by $\sqrt{g_0 R_e}$ |
| $\gamma$ | flight path angle (degree) |
| $D$ | normalized drag in g (acceleration due to gravity) |
| $L$ | normalized lift in g |
| $C_L$ | lift coefficient |
| $C_D$ | drag coefficient |
| $\alpha$ | angle of attack (degree) |
| $\sigma$ | bank angle (degree) |
| $m$ | mass of the vehicle (104,305 kg) |
| $S$ | vehicle reference area (391.22 m²) |
| $R_e$ | Earth radius (6,378 km) |
| $g_0$ | acceleration due to gravity at sea level (9.81 m/s²) |
| $\rho$ | air density, $\bar{\rho}\,\{1-(25.512r-25.432)\cos[2\pi(129.96-127.56r)]\}$, where $\bar{\rho} = 1.752\,e^{-R_e(r-1)/6700}$ kg/m³ (with $R_e(r-1)$ in meters) |

In this example, only the normalized longitudinal dynamics of the vehicle are studied. The 2-DOF longitudinal dynamics of the system are modeled using the normalized states $r$, $V$, and $\gamma$, and are defined by the following nonlinear system functions:


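$$\dot{r} = V\sin\gamma \qquad (15)$$

$$\dot{V} = -D - \frac{\sin\gamma}{r^2} \qquad (16)$$

$$\dot{\gamma} = \frac{\big(V^2 - 1/r\big)\cos\gamma}{V r} + \frac{D}{V}\,\frac{C_L}{C_D}\cos\sigma, \qquad (17)$$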


where the drag $D$ and the lift and drag coefficients $C_L$ and $C_D$ are calculated from the following equations:


$$C_L = -0.041065 + 0.016292\,\alpha + 0.00026024\,\alpha^2 \qquad (18)$$

$$C_D = 0.080505 - 0.03026\,C_L + 0.86495\,C_L^2 \qquad (19)$$

$$D = 0.5\,\rho V^2 R_e S C_D / m \qquad (20)$$


where the angle of attack $\alpha$ is in degrees. For this experiment, an exponential atmosphere was used that closely matches the 1976 Standard Atmosphere in the altitude regions of interest in this effort. The exponential expression for the atmospheric density $\rho$ is

$$\rho = 1.752\, e^{-h/6700} \ \text{(kg/m}^3\text{)}, \qquad (21)$$

where $h = R_e(r-1)$ is the altitude in meters.

The first of the three system equations, Eq. (15), defines the kinematics, while the other two, Eqs. (16) and (17), model the accelerations due to the aerodynamic forces (lift and drag) and gravity. The two control inputs are the angle of attack $\alpha$ and the bank angle $\sigma$: $\alpha$ determines the lift and drag coefficients, and hence the magnitude of the aerodynamic forces, while $\sigma$ determines the direction of the lift force acting on the vehicle. The task of trajectory design is to generate a series of $\alpha$ and $\sigma$ commands so that the vehicle reaches the desired state.
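Eqs. (15)-(21) can be coded directly as the continuous-time right-hand side used by the hybrid system model. This sketch assumes that the derivatives are taken with respect to time normalized by $\sqrt{R_e/g_0}$ (about 806 s), which is what the normalizations of $r$ and $V$ in Table 1 imply; it also assumes that $\gamma$ is handled in radians internally and that the plain exponential atmosphere of Eq. (21) is used. The constant and function names are our own.

```python
import numpy as np

# Constants from Table 1 (SI units)
M = 104_305.0                  # vehicle mass, kg
S = 391.22                     # reference area, m^2
RE = 6_378_000.0               # Earth radius, m
G0 = 9.81                      # sea-level gravity, m/s^2
V_SCALE = np.sqrt(G0 * RE)     # velocity normalization, m/s


def density(r):
    """Exponential atmosphere, Eq. (21), with altitude h = RE * (r - 1) in meters."""
    return 1.752 * np.exp(-RE * (r - 1.0) / 6700.0)


def aero(r, V, alpha_deg):
    """Lift/drag coefficients, Eqs. (18)-(19), and normalized drag D in g, Eq. (20)."""
    cl = -0.041065 + 0.016292 * alpha_deg + 0.00026024 * alpha_deg**2
    cd = 0.080505 - 0.03026 * cl + 0.86495 * cl**2
    D = 0.5 * density(r) * V**2 * RE * S * cd / M
    return cl, cd, D


def dynamics(state, control):
    """Normalized longitudinal dynamics, Eqs. (15)-(17).

    state = (r, V, gamma) with gamma in radians (our convention);
    control = (alpha, sigma) in degrees.
    """
    r, V, gamma = state
    alpha_deg, sigma_deg = control
    cl, cd, D = aero(r, V, alpha_deg)
    r_dot = V * np.sin(gamma)
    V_dot = -D - np.sin(gamma) / r**2
    gamma_dot = ((V**2 - 1.0 / r) * np.cos(gamma) / (V * r)
                 + (D / V) * (cl / cd) * np.cos(np.radians(sigma_deg)))
    return np.array([r_dot, V_dot, gamma_dot])


# Usage: 60 km altitude, 6000 m/s, level flight, alpha = 40 deg, sigma = 0.
r = 1.0 + 60_000.0 / RE
V = 6000.0 / V_SCALE
cl, cd, D = aero(r, V, 40.0)
xdot = dynamics((r, V, 0.0), (40.0, 0.0))
```

At this flight condition the normalized drag evaluates to roughly 1.5 g. Combined with a zero-order-hold integrator, the 5 s sampling interval of Section IV corresponds to a normalized step of about $5/806$.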
B. System Constraints

Certain constraints must be considered to design a feasible trajectory. For example, when the vehicle is at a very high altitude with a high speed, a large $\alpha$ is preferred so that the vehicle presents its heat-resistant bottom surface to the flow, avoiding vehicle damage. The constraints are as follows:

1) Heating rate constraint:

$$\sqrt{\rho}\,\big(\sqrt{g_0 R_e}\,V\big)^3 \le 3.305 \times 10^9 \qquad (22)$$

2) Load factor constraint in the body-normal direction:

$$L\cos\alpha + D\sin\alpha \le 2 \ \text{(g)}, \qquad (23)$$

where $L$ is the dimensionless lift acceleration in g:

$$L = 0.5\,\rho V^2 R_e S C_L / m \qquad (24)$$

3) Dynamic pressure constraint:

$$D \le q_{\max} S C_D / (m g_0), \qquad (25)$$

where $q_{\max} = 16{,}280$ N/m² is the maximum allowable dynamic pressure.
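The three path constraints can be checked pointwise as follows. This is a minimal sketch that re-evaluates Eqs. (18)-(21) and (24) internally; the function and constant names are hypothetical, and it assumes the heating-rate bound of Eq. (22) applies to the dimensional speed $\sqrt{g_0 R_e}\,V$ in m/s.

```python
import numpy as np

M, S, RE, G0 = 104_305.0, 391.22, 6_378_000.0, 9.81   # Table 1, SI units
V_SCALE = np.sqrt(G0 * RE)                            # velocity normalization, m/s
HEAT_MAX = 3.305e9                                    # bound in Eq. (22)
LOAD_MAX = 2.0                                        # bound in Eq. (23), g
Q_MAX = 16_280.0                                      # q_max in Eq. (25), N/m^2


def path_constraints(r, V, alpha_deg):
    """Check Eqs. (22), (23), (25) at one point; True means satisfied."""
    rho = 1.752 * np.exp(-RE * (r - 1.0) / 6700.0)                  # Eq. (21)
    cl = -0.041065 + 0.016292 * alpha_deg + 0.00026024 * alpha_deg**2
    cd = 0.080505 - 0.03026 * cl + 0.86495 * cl**2
    D = 0.5 * rho * V**2 * RE * S * cd / M                          # Eq. (20)
    L = 0.5 * rho * V**2 * RE * S * cl / M                          # Eq. (24)
    a = np.radians(alpha_deg)
    return {
        "heating": np.sqrt(rho) * (V_SCALE * V) ** 3 <= HEAT_MAX,
        "load_factor": L * np.cos(a) + D * np.sin(a) <= LOAD_MAX,
        "dynamic_pressure": D <= Q_MAX * S * cd / (M * G0),
    }


ok = path_constraints(1.0 + 60_000.0 / RE, 4000.0 / V_SCALE, 40.0)
bad = path_constraints(1.0 + 40_000.0 / RE, 7000.0 / V_SCALE, 40.0)
```

For instance, at 60 km altitude and 4000 m/s with $\alpha = 40^\circ$ all three constraints are satisfied, whereas at 40 km and 7000 m/s all three are violated.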


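IV. Experimental Results

Based on the system functions and the constraints, we formulate the components of Eq. (2) as:

$$\begin{aligned}
\phi\big(x[N]\big) = {} & Q_r[N]\big(r[N]-r_d\big)^2 + Q_V[N]\big(V[N]-V_d\big)^2 + Q_\gamma\big(\gamma[N]-\gamma_d\big)^2 \\
& + Q_h\Big(\sqrt{\rho}\,\big(\sqrt{g_0 R_e}\,V[N]\big)^3 - 3.305\times10^9\Big)^2 \\
& + Q_l\big(L[N]\cos\alpha[N-1] + D[N]\sin\alpha[N-1] - 2\big)^2 \\
& + Q_d\big(D[N] - q_{\max} S C_D[N]/(m g_0)\big)^2
\end{aligned}$$

and $\Gamma(x[0], u[0]) = 0$; for $k = 1, \ldots, N-1$,

$$\begin{aligned}
\Gamma\big(x[k], u[k]\big) = {} & Q_r[k]\big(r[k]-r_d\big)^2 + Q_V[k]\big(V[k]-V_d\big)^2 + Q_\gamma\big(\gamma[k]-\gamma_d\big)^2 \\
& + Q_\alpha\big(\alpha[k]-\alpha_d\big)^2 + Q_\sigma\,\sigma[k]^2 \\
& + Q_h\Big(\sqrt{\rho}\,\big(\sqrt{g_0 R_e}\,V[k]\big)^3 - 3.305\times10^9\Big)^2 \\
& + Q_l\big(L[k]\cos\alpha[k-1] + D[k]\sin\alpha[k-1] - 2\big)^2 \\
& + Q_d\big(D[k] - q_{\max} S C_D[k]/(m g_0)\big)^2
\end{aligned}$$

subject to

$$14^\circ \le \alpha \le 40^\circ, \qquad 0^\circ \le \sigma \le 80^\circ,$$

where the final desired normalized state $[r_d\ \ V_d\ \ \gamma_d]^T$ has the value $[1.0039197\ \ 0.0948\ \ 0]^T$, corresponding to an altitude of 25 km, a velocity of 750 m/s, and a zero flight path angle; $Q_r, Q_V, Q_\gamma, Q_\alpha, Q_\sigma, Q_h, Q_l, Q_d$ are the penalty factors for altitude, velocity, flight path angle, angle of attack, bank angle, heating constraint, load factor constraint, and dynamic pressure constraint, respectively.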
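As a consistency check on the target values, undoing the normalizations of Table 1 recovers the 25 km altitude and 750 m/s velocity:

$$r_d = 1 + \frac{25\ \text{km}}{6378\ \text{km}} \approx 1.0039197, \qquad V_d = \frac{750\ \text{m/s}}{\sqrt{9.81\ \text{m/s}^2 \times 6.378\times10^{6}\ \text{m}}} \approx \frac{750}{7910} \approx 0.0948.$$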
During training, the initial states are generated by selecting the altitude, velocity, and flight path angle uniformly at random from the intervals [25, 120] km, [750, 7500] m/s, and [-10°, 0°], respectively. The time interval between consecutive samples is 5 seconds. The values of $Q_r, Q_V, Q_\gamma, Q_\alpha, Q_\sigma, Q_h, Q_l, Q_d$ are 1, 0.1, 0.1, 0.1, 0.05, 0.1, 0.1, and 0.1, respectively. Since this is an infinite-horizon problem, a relatively large value, 100, was set for $N$. The controller MLP has 15 hidden nodes, and its learning rate is $10^{-7}$. The experimental results are displayed in Figs. 3-5. These figures show the vehicle states and the controls, as well as the horizontal distance (labeled as distance) the vehicle has traveled. This distance can be calculated from the kinematic relation

$$\dot{s} = V\cos\gamma. \qquad (26)$$
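The sampling of training initial states described above can be sketched as follows. The sketch assumes independent uniform draws for the three quantities, conversion to the normalized states of Table 1, and $\gamma$ in radians; the function name is hypothetical.

```python
import numpy as np

RE = 6_378_000.0                 # Earth radius, m (Table 1)
V_SCALE = np.sqrt(9.81 * RE)     # velocity normalization, m/s


def sample_initial_states(n, rng):
    """Draw n initial states uniformly from the training intervals of Section IV.

    Returns rows (r, V, gamma): r and V normalized as in Table 1,
    gamma converted to radians (our convention).
    """
    h = rng.uniform(25e3, 120e3, n)        # altitude, m
    v = rng.uniform(750.0, 7500.0, n)      # speed, m/s
    gam = rng.uniform(-10.0, 0.0, n)       # flight path angle, deg
    return np.column_stack([1.0 + h / RE, v / V_SCALE, np.radians(gam)])


X0 = sample_initial_states(1000, np.random.default_rng(1))
```

Each row can serve as $x[0]$ for one pass of the stochastic training loop of Fig. 2.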

Fig. 3 shows the trajectory of the vehicle from the same entry point as used in [4]. The vehicle successfully descends to the desired state within around 1000 seconds. To further evaluate the capability of the NDO, it was tested with some cases that it had not “seen” during training. Fig. 4 shows trajectories of the vehicle for four different initial states. In all cases, the initial altitude (50 km) and velocity (3000 m/s) are the same; however, the flight path angles are 0°, -2°, 5°, and -12°, respectively. Since all the flight path angles during the training phase come from the interval [-10°, 0°], the third and fourth cases are challenges for the NDO. In Case 3, the vehicle takes much longer to reach the desired state because its initial motion is directed away from the desired final state. Case 4 is also very challenging, as the vehicle is diving toward the desired altitude on an unsustainable flight path, which results in higher heating due to the higher velocity at denser, lower altitudes. Without effective controls, the vehicle would reach the desired altitude, but with a much higher velocity than desired. Naturally, the vehicle must be directed to fly upward to reduce its speed, and then descend toward the desired final state. The NDO controller was not trained with such a strategy, as the training is based on the initial states alone. However, Fig. 4 shows that the NDO successfully meets this difficult challenge, and the trajectory of Case 4 is in line with what is expected of a feasible solution. In summary, the NDO controller successfully generates a feasible path to the desired final state from all four initial conditions.


Fig. 3. Trajectory of a Vehicle Entering the Atmosphere at 120 km at a Speed of 7450 m/s and a 0 deg Flight Path Angle.
Finally, to evaluate the robustness of the NDO controller, we used it to design a trajectory for a damaged vehicle. The damaged vehicle was modeled with a 5% reduction in vehicle mass and a 30% loss of reference surface area. In this example, the NDO controller faces a difficult challenge, as the system dynamics are altered significantly and the controller has no prior knowledge of the change. Note that the vehicle's feedback response to the control commands, observed through the measured state, is the only indication of the altered system dynamics. Once again, Fig. 5 shows that the NDO is capable of generating a feasible path to the desired state for a significantly damaged vehicle.
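Because both $D$ in Eq. (20) and $L$ in Eq. (24) are proportional to $S/m$, the damage model above can be summarized in one ratio. At the same state and controls, the damaged vehicle's aerodynamic accelerations are scaled by

$$\frac{D_{\text{damaged}}}{D_{\text{nominal}}} = \frac{L_{\text{damaged}}}{L_{\text{nominal}}} = \frac{0.70\,S/(0.95\,m)}{S/m} = \frac{0.70}{0.95} \approx 0.737,$$

that is, they are roughly 26% lower than for the intact vehicle.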




Fig. 4. Trajectories of a Vehicle with Identical Entry Altitude and Velocity, but Different Flight Path Angles
Fig. 5. Trajectories for a Normal Vehicle and a Damaged Vehicle.

V. Conclusion

In this paper, we have explored the theory of Neural Dynamic Optimization (NDO) and applied it to a difficult trajectory design problem: designing feasible reentry trajectories for an RLV. Our experimental results show that a properly trained NDO operates robustly over a wide range of input conditions and generates feasible and acceptable trajectories. The results demonstrate that the NDO controller can find feasible solutions even for cases that were not included in its training. NDO therefore offers promise as a basis for the real-time trajectory design and reshaping that is desired for a class of RLVs to achieve more flexibility in autonomous operations. However, drawbacks were also noticed in NDO applications. First, the NDO solution is not a complete dynamic programming solution, which means that it may converge to a local rather than a global optimum. Second, tuning the weight updating process (through parameters such as the learning rate and penalty factors) is not a trivial task; it requires detailed analysis of the system function and the penalty function, as well as experience in neural network training. Third, the training process is slow because the weights are updated sequentially. Consequently, advanced MLP training techniques, which can update weights very quickly by analyzing a batch of data points, cannot easily be applied.

We have set the following directions for future work. First, we need to compare NDO with other control methods [5]; second, we need to study how to adjust the training parameters more systematically; and third, we want to extend the vehicle model to incorporate lateral motion in order to design more realistic 3-D trajectories.